bzImage / initrd support (v2) #670


Conversation

tim-deegan-aws
Contributor

This is a rebase of #428 - after various repo shuffles I'm not able to update that PR with the new version.

It adds support for loading bzImage kernels and initrds as an alternative to ELF64 kernels. It's slower than the vmlinux format because the kernel must decompress itself, but useful for trying kernels that I didn't build myself.
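
(For illustration only, not this PR's actual code: the two formats can be told apart by their magic numbers — an ELF file starts with `\x7fELF`, while a bzImage carries the Linux boot protocol's `HdrS` signature at offset 0x202 of its setup header. A minimal sketch in Rust:)

```rust
// Sketch only (not the PR's implementation): distinguishing a vmlinux ELF
// from a bzImage by magic number. The ELF magic sits at offset 0; the Linux
// boot protocol's "HdrS" signature sits at offset 0x202 of the setup header.
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn is_elf(kernel: &mut File) -> std::io::Result<bool> {
    let mut magic = [0u8; 4];
    kernel.seek(SeekFrom::Start(0))?;
    kernel.read_exact(&mut magic)?;
    Ok(&magic == b"\x7fELF")
}

fn is_bzimage(kernel: &mut File) -> std::io::Result<bool> {
    let mut sig = [0u8; 4];
    kernel.seek(SeekFrom::Start(0x202))?;
    kernel.read_exact(&mut sig)?;
    Ok(&sig == b"HdrS")
}
```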

For now, it loads the initrd at a fixed address; it would be better to have a way of finding a free area of memory after the kernel has loaded.
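
(To make "loads the initrd at a fixed address" concrete, here is a minimal sketch, not the PR's code: the constant, the simplified boot-params struct, and the byte-slice model of guest memory are illustrative assumptions. The Linux boot protocol records the initrd location in the `ramdisk_image` / `ramdisk_size` setup-header fields.)

```rust
// Illustrative sketch only: INITRD_LOAD_ADDR and BootParams are assumptions,
// not names from this PR. Guest memory is modeled as a plain byte slice.
use std::fs::File;
use std::io::Read;

/// Hypothetical fixed guest-physical address for the initrd.
const INITRD_LOAD_ADDR: usize = 0x0100_0000; // 16 MiB

/// Stand-in for the boot-params fields the kernel reads to find the initrd.
#[derive(Default)]
struct BootParams {
    ramdisk_image: u32,
    ramdisk_size: u32,
}

fn load_initrd(guest_mem: &mut [u8], path: &str, params: &mut BootParams) -> std::io::Result<()> {
    let mut image = Vec::new();
    File::open(path)?.read_to_end(&mut image)?;

    let end = INITRD_LOAD_ADDR + image.len();
    if end > guest_mem.len() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidInput,
            "initrd does not fit at the fixed load address",
        ));
    }
    // Copy the initrd into guest memory and record its location and size.
    guest_mem[INITRD_LOAD_ADDR..end].copy_from_slice(&image);
    params.ramdisk_image = INITRD_LOAD_ADDR as u32;
    params.ramdisk_size = image.len() as u32;
    Ok(())
}
```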

@alxiord

alxiord commented Nov 28, 2018

Changes look good. Please add integration tests, as this is new functionality. Add an entry in the changelog too, since it's customer-facing.

@tim-deegan-aws
Contributor Author

I am working on integration tests -- it took a little longer to get the env up and running than I expected, but I should have something this week.

@tim-deegan-aws
Contributor Author

Rebased, and added integration tests and changelog entries. The integration tests show as 'error' because the matching images are not in the spec.ccfc.min bucket, so the tests can't run. They pass for me when running with TEST_MICROVM_IMAGES_S3_BUCKET=tdeegan.fc.test - that bucket has the extra images img/bzimage/ and img/initrd/. What's the right way to merge those into spec.ccfc.min?

@tim-deegan-aws
Contributor Author

Noop push to trigger the tests again.

@@ -110,6 +110,7 @@ s3://<bucket-url>/microvm-images/
<microvm_test_image_folder_n>/
Contributor


Since you're touching/fixing this file, please also update the S3 bucket name from spec.firecracker to spec.ccfc.min; the tree has also changed from:

s3://<bucket-url>/microvm-images/

to

s3://<bucket-name>/img/

Contributor Author


Yep, updated.

acatangiu previously approved these changes Dec 14, 2018
Contributor

@acatangiu left a comment


Changes look good. I was expecting the boot time to be larger when using bzImage, but not that much larger 😄.

@raduweiss
Contributor

Not material for this PR, but what compression format is the new image in? I think some formats might actually be faster than the uncompressed binary.

alexandruag previously approved these changes Dec 18, 2018
Tries loading as ELF first, then falls back to bzImage.
bzImage is slower to boot than ELF, because the kernel has to
decompress itself.

Added an init_size field to the bootparams block, which the
decompressor uses to know how much space it has to work with.  Made it
big enough for a moderate-sized kernel.  We really ought to be extracting
the bootparams block from the kernel instead.

Signed-off-by: Tim Deegan <[email protected]>
Plumb through an optional initrd path from the REST API, and load it
at a fixed address.

Signed-off-by: Tim Deegan <[email protected]>
Add plumbing for test VMs that have initrd and no rootfs, and
test the boot time of both bzImage and bzImage+initrd configurations.
They need longer timeouts than the vmlinux test because they both
involve decompression at boot time.

Signed-off-by: Tim Deegan <[email protected]>
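
(Relating to the middle commit above, which plumbs an optional initrd path through the REST API: a hedged sketch of what the deserialized boot-source request body might look like. The field names are illustrative assumptions, not necessarily those used in this PR, and the snippet assumes the serde crate with its derive feature.)

```rust
// Hypothetical shape of a boot-source request body with an optional initrd.
// Field names are illustrative assumptions; requires serde with "derive".
use serde::Deserialize;

#[derive(Deserialize)]
struct BootSourceConfig {
    /// Path to the kernel image (ELF or bzImage).
    kernel_image_path: String,
    /// Optional kernel command line.
    boot_args: Option<String>,
    /// Optional initrd; when absent, no initrd is loaded.
    initrd_path: Option<String>,
}
```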
@tim-deegan-aws
Contributor Author

Not material for this PR, but what compression format is the new image in? I think some formats might actually be faster than the uncompressed binary.

The kernel and initrd are both gzip'd. I didn't try other compression formats, but I agree that something like lz4 might go faster.

@tim-deegan-aws
Contributor Author

Thanks all for the reviews!

I've rebased against current master. The only conflicts were in the changelog and a line of context in test_boottime.py; no other changes since last time.

@raduweiss
Contributor

Tim, apologies for showing up here at this late stage, but in the last few days I was thinking that we should stick to our minimalism tenet here, and only add additional supported load protocols when there's a clear requirement from a customer / user. Since right now there does not seem to be demand for this, I don't think this is something we should merge at this time.

So, to be consistent with our mission, there are two options:

  1. Drop this now and add it when there is demand for it.
  2. Add it now, but as a feature (like the current experimental vsock).

Again, sorry for waking up to this so late in the game, I didn't think about it until recently.

@tim-deegan-aws
Contributor Author

Totally understood. No point in having code in there that nobody needs. I'll drop this for now and it can be revived if there's more demand.

@rgooch

rgooch commented Jan 9, 2019

I would like to see this feature.

@raduweiss
Contributor

@rgooch is there a reason the rootfs drive option doesn't work for you?

@rgooch

rgooch commented Jan 10, 2019

@raduweiss: I have OS images which contain an initrd. I would like to be able to boot those with Firecracker instead of QEMU.

@raduweiss
Contributor

The thing with additional boot-process features like initrd is that they go against Firecracker's commitment to super-low start-up times [1]. I understand your use case, but I think this is one place where we need to trade off convenience for performance. So far, we haven't seen a workload that is blocked on initrd, as in, one with no other reasonable way to boot the guest.

[1] https://github.com/firecracker-microvm/firecracker/blob/master/SPECIFICATION.md

@rgooch

rgooch commented Jan 10, 2019

Yes, I agree it can be worked around. However, we have a lot of people building conventional VM images which use initrd. It would be a lot of engineering work for them to get off of initrd, especially considering that basically all the major Linux OS vendors depend on initrd. It's a major effort to get people to start with something else.

I realise that you want to keep Firecracker small. That said, with initrd support, I could boot most Linux VMs with Firecracker rather than QEMU. So close...

@rn
Contributor

rn commented Jan 15, 2019

@raduweiss I agree that we'd like to keep Firecracker (the binary) small and fast, and bzImage/initrd will increase boot time due to the need for decompression. However, I also view Firecracker, the repository, a bit like a toolbox.

In general, what do you think of including bzImage/initrd support as an optional/non-default feature which can be enabled at compile time? Booting bzImage/initrd is just super common and very convenient as they are easy to create/extract.
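
(A sketch of what "optional/non-default feature enabled at compile time" could look like in Rust, using a Cargo feature; the feature name bzimage is an assumption, not an agreed-upon name.)

```rust
// Sketch: gate bzImage support behind a Cargo feature so the default build
// stays minimal. In Cargo.toml (assumed):
//   [features]
//   bzimage = []

#[cfg(feature = "bzimage")]
fn try_load_bzimage(kernel_bytes: &[u8]) -> Result<(), String> {
    // bzImage parsing and loading would live here when the feature is enabled.
    let _ = kernel_bytes;
    Ok(())
}

#[cfg(not(feature = "bzimage"))]
fn try_load_bzimage(_kernel_bytes: &[u8]) -> Result<(), String> {
    Err("bzImage support not compiled in; rebuild with --features bzimage".to_string())
}
```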

@rgooch

rgooch commented Jan 15, 2019

I like the toolbox approach. Another possibility would be a plugin approach.
I agree that bzImage+initrd is very common and it's very hard to avoid without building a custom OS.

@rn
Contributor

rn commented Jan 15, 2019

Converting a bzImage to an ELF image for Firecracker is pretty easy. This script does the job. The initrd is just a cpio archive, so extracting and packaging it into a disk image should be pretty straightforward too.

These are extra steps, so making bzImage/initrd support optional should be sufficient.

@rgooch

rgooch commented Jan 15, 2019

Creating a file-system (ext4) image isn't cheap. I measured around 130 ms to mkfs.ext4 an 80 MiB FS. There's another 100+ ms to unmount the FS. This was using losetup to create a block device from the file. I would rather not have to create a FS image to feed in an initrd.

@rn
Contributor

rn commented Jan 16, 2019

I wasn't suggesting creating the root fs on the fly, but rather doing a one-off conversion if you have an initrd. Do you have a specific use case in mind where it is necessary to convert it on the fly?

@rgooch

rgooch commented Jan 16, 2019

We have a model where we don't store disc images; instead, we store the directory structure and each regular file as a unique object. This is more efficient when there are many different images which have a hierarchy of derivations. When a VM is created, the root volume is created on the fly. The size of the volume is the size of the image plus a specified amount of free space. The SmallStack Hypervisor performs this on-the-fly root volume creation.

Pre-computing a root volume would imply that the volume (and thus FS) would need to be resized when the VM is created. A bigger issue is that we'd have to pre-compute all these root volumes and that would be expensive to store and distribute, as there would be no de-duplication of file objects.

@andreeaflorescu
Member

Can we test how fast it would boot with lz4 images? I don't see any harm in having this support in Firecracker if it makes sense from a boot-time perspective (as in, it has boot times similar to what we currently achieve with ELF).

@rn
Contributor

rn commented Jan 24, 2019

I was going to run some tests but haven't got around to it yet.

There is another option, which is to decompress the kernel as part of the load (rather than the boot). If I understand the code right, this PR currently puts the bzImage into memory and then it gets decompressed as part of the boot. But we could also parse the image based on the format specified here, extract the vmlinux image from it, and load that. This way the boot time should not be affected, but the REST call for passing the image might be slower. A while back I implemented a parser/extractor for bzImages in Go which may provide a hint on how to do this in Rust.
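
(A hedged sketch of the "parse the image and extract the payload" idea, using only offsets documented in the Linux x86 boot protocol: setup_sects at 0x1f1, payload_offset at 0x248 and payload_length at 0x24c, with payload_offset measured from the start of the protected-mode code. Actually recovering a vmlinux would still require decompressing that payload on the host; the helper names here are illustrative.)

```rust
// Sketch: locate the compressed kernel payload inside a bzImage using the
// Linux boot protocol header. Extracting vmlinux would additionally require
// decompressing this payload (gzip/lz4/... depending on the kernel config).
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn read_u32_at(f: &mut File, offset: u64) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    f.seek(SeekFrom::Start(offset))?;
    f.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Returns (offset, length) of the compressed payload within the bzImage file.
fn payload_location(bzimage: &mut File) -> std::io::Result<(u64, u64)> {
    // setup_sects lives at 0x1f1; a value of 0 means 4 (legacy quirk).
    let mut setup_sects = [0u8; 1];
    bzimage.seek(SeekFrom::Start(0x1f1))?;
    bzimage.read_exact(&mut setup_sects)?;
    let setup_sects = if setup_sects[0] == 0 { 4 } else { setup_sects[0] } as u64;

    // payload_offset (0x248) is relative to the start of the protected-mode
    // code, which begins right after the real-mode setup sectors.
    let payload_offset = read_u32_at(bzimage, 0x248)? as u64;
    let payload_length = read_u32_at(bzimage, 0x24c)? as u64;
    let protected_mode_start = (setup_sects + 1) * 512;
    Ok((protected_mode_start + payload_offset, payload_length))
}
```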

@rgooch

rgooch commented Jan 24, 2019

From my perspective, support for bzImage format would be convenient, but initrd support is more important to me. I can decompress and unpack the bzImage externally. I can even keep a small cache of extracted vmlinux files to amortise the unpacking cost.

@marcov
Contributor

marcov commented Aug 30, 2019

Hi, I am bumping this PR, since I am seeing this feature listed in the 2020 Roadmap.

The initrd has the big advantage of allowing a generic kernel with only a few CONFIG_ options built in, and no need to support different rootfs filesystems.

@tim-deegan-aws would you consider resuming this? If you need, I'd be glad to help / spend some time on this to get it upstream.

@andreeaflorescu
Member

@marcov this PR also adds support for bzImage. For now we only agreed on adding initrd path support.

I am not sure @tim-deegan-aws is still active on GitHub, so I wouldn't wait for too long on a response.

@rgooch

rgooch commented Sep 2, 2019

@andreeaflorescu: Are you saying that what's stalling the addition of initrd support from this PR is the bzImage support?
If a PR was submitted with only initrd support, would it be accepted more easily?

@andreeaflorescu
Member

@rgooch Yep, a PR that only adds initrd support would be easily merged, because that's what we also have on our roadmap.

@marcov mentioned this pull request Sep 6, 2019
@marcov
Contributor

marcov commented Sep 6, 2019

Opened a dedicated PR for initrd support here: #1246

@andreeaflorescu
Member

@marcov thanks, we will take a look.
